Support AWQ models #1049
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@mvafin, can we have a test for this, e.g. with "hf-internal-testing/Mixtral-tiny-AWQ" or any other dummy model?
Yes, a test would be great.
@AlexKoff88 @IlyasMoutawwakil Test added.
@mvafin, some of the tests failed:
Can you please check on your side?
Fixed.
Still getting AWQ CUDA errors.
That might be because optimum is not tested with the 2024.6 version of OpenVINO.
enable awq export only if ov supports it
* fix style
* disable autogptq and autoawq install for old transformers testing
- if: ${{ matrix.transformers-version == 'latest' && matrix.test-pattern == '*modeling*'}}
  name: Install auto-gptq, autoawq
  run: |
    pip install auto-gptq autoawq --extra-index-url https://download.pytorch.org/whl/cpu
These are not valid extra index URLs for auto-gptq and autoawq.
This is needed to prevent torch from being reinstalled with CUDA while installing these third-party packages: the packages themselves are installed from the regular index, while torch-dependent libraries are resolved from the torch CPU URL. (The difference between --index-url and --extra-index-url is that the first completely redefines the source index, whereas the second only adds the URL as an additional source used when a package is available there.)
The PR in its current form seems to degrade the user experience greatly. My understanding is that it suggests we pass the responsibility of patching autoawq and autogptq to the user, like writing their own patches?
These manual patches are only required for running the original torch model without CUDA in the test environment. The general flow for model conversion and inference with OpenVINO does not require them, so there is no impact on the optimum-intel user experience, since running the original model in torch is outside our responsibility. We already have similar handling for converting GPTQ models, so why is it becoming a problem?
Thanks for the explanation! I thought it was also being used for the quantization process on CPU, but I see that patch is still in its place.
supported_quant_methods = ["gptq"]
if is_openvino_version(">=", "2024.6.0"):
    supported_quant_methods.append("awq")
do_gptq_patching = quantization_config and quantization_config["quant_method"] in supported_quant_methods
Why patch the auto-gptq lib when the method is awq?
Separated the gptq-specific patching.
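For context, a minimal sketch (with illustrative variable names, not taken verbatim from the PR) of how the gptq-specific patching can be separated from the generic quant-method check:

# Hypothetical sketch: keep one list of quant methods OpenVINO can export,
# but only trigger auto-gptq patching when the method is actually gptq.
supported_quant_methods = ["gptq"]
if is_openvino_version(">=", "2024.6.0"):  # AWQ export needs OpenVINO >= 2024.6
    supported_quant_methods.append("awq")

quant_method = quantization_config["quant_method"] if quantization_config else None
is_quantized_export = quant_method in supported_quant_methods  # shared gptq/awq export path
do_gptq_patching = quant_method == "gptq"  # auto-gptq patching only for gptq models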
# quantized models have higher tolerance
if "awq" in model_arch:
    atol = 1e-2
elif "gptq" in model_arch:
    atol = 0.6
These logits can't be considered "allclose" if this is the atol, imo. Does generation return the same output?
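For reference, a rough sketch of the kind of check suggested here, comparing greedily generated token ids instead of raw logits; the helper name and arguments are placeholders, not code from the PR:

import torch

def generations_match(ov_model, torch_model, tokenizer, prompt="Hello"):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        ref_ids = torch_model.generate(**inputs, max_new_tokens=10, do_sample=False)
    ov_ids = ov_model.generate(**inputs, max_new_tokens=10, do_sample=False)
    # With greedy decoding, the token ids should match exactly even if the logits differ slightly.
    return torch.equal(ref_ids, ov_ids)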
Possibly it is an issue with the model itself, let me check.
It looks like installing autoawq on Windows downgrades torch to 2.3.1, while on Linux it does not. Can I disable the AWQ tests for Windows?
What does this PR do?
Add support for AWQ models, now that this support was added in OpenVINO 2024.6 (openvinotoolkit/openvino#27859).
The 16-bit patching can also be used for GPTQ and AWQ models to support FP16/BF16 regions in the model.
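As an illustration (not part of the PR description), a hedged usage sketch assuming OpenVINO >= 2024.6 and using the dummy AWQ checkpoint from the tests:

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "hf-internal-testing/Mixtral-tiny-AWQ"  # tiny AWQ checkpoint used in the tests
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the AWQ checkpoint to OpenVINO IR; inference with the exported
# model should not require CUDA or the autoawq kernels.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0]))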
Fixes # (issue)
Before submitting